PArallel RAndomly COMPressed Cubes (PARACOMP): A Scalable Distributed Architecture for Big Tensor Decomposition
Abstract
This article combines a tutorial on state-of-the-art tensor decomposition as it relates to big data analytics with original research on parallel and distributed computation of low-rank decompositions of big tensors, and a concise primer on Hadoop-MapReduce. It proposes a novel architecture for parallel and distributed low-rank tensor decomposition that is especially well suited to big tensors. The architecture processes, in parallel, a set of randomly compressed, reduced-size ‘replicas’ of the big tensor. Each replica is decomposed independently, and the results are joined via one master linear equation per tensor mode. The approach enables massive parallelism with guaranteed identifiability: if the big tensor is indeed of low rank and the system parameters are appropriately chosen, then the rank-one factors of the big tensor are recovered from the analysis of the reduced-size replicas. Furthermore, the architecture affords memory/storage and complexity gains of order IJ/F for a big tensor of size I × J × K and rank F, with F ≤ I ≤ J ≤ K. No sparsity is required in the tensor or the underlying latent factors, although such sparsity can be exploited to further increase memory, storage, and computational savings.
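The replica-and-join pipeline described above can be sketched for a single mode in NumPy. This is a minimal illustration under assumptions that do not come from the article: the compression matrices are Gaussian; each replica is decomposed with Jennrich's simultaneous-diagonalization algorithm (exact for noiseless low-rank data, swapped in here for whatever CP solver a real deployment would use); and the permutation/scaling ambiguity of each replica is resolved by a hypothetical anchor scheme in which every U_p shares its first `n_anchor` columns. All function names (`compress`, `jennrich_mode1`, `align_to_reference`, `paracomp_mode1`) are illustrative. The key fact it relies on is that a replica Y_p = X ×1 U_p^T ×2 V_p^T ×3 W_p^T of a tensor with CP factors (A, B, C) has CP factors (U_p^T A, V_p^T B, W_p^T C), so the recovered replica factors can be stacked into one linear system for A.

```python
import numpy as np


def compress(X, U, V, W):
    """Random multilinear compression Y = X x1 U^T x2 V^T x3 W^T.

    X is I x J x K; U, V, W are I x L, J x M, K x N, so Y is an L x M x N replica.
    If X has CP factors (A, B, C), then Y has CP factors (U^T A, V^T B, W^T C).
    """
    return np.einsum("ijk,il,jm,kn->lmn", X, U, V, W, optimize=True)


def jennrich_mode1(Y, F, rng):
    """Mode-1 CP factor of an exactly rank-F replica via Jennrich's algorithm.

    Stand-in (assumption) for a generic CP solver: like any CP solver, it returns
    the factor only up to an unknown column permutation and scaling.
    """
    L, M, N = Y.shape
    # Orthonormal bases of the rank-F mode-1 and mode-2 column spaces.
    Ua = np.linalg.svd(Y.reshape(L, M * N), full_matrices=False)[0][:, :F]
    Ub = np.linalg.svd(Y.transpose(1, 0, 2).reshape(M, L * N), full_matrices=False)[0][:, :F]
    # Two random combinations of the frontal slices, projected to F x F.
    x, y = rng.standard_normal(N), rng.standard_normal(N)
    Tx = Ua.T @ np.tensordot(Y, x, axes=([2], [0])) @ Ub
    Ty = Ua.T @ np.tensordot(Y, y, axes=([2], [0])) @ Ub
    # Tx Ty^{-1} = G D G^{-1}, G = Ua^T (replica mode-1 factor), D diagonal,
    # so the eigenvectors are G's columns up to permutation and scaling.
    _, G = np.linalg.eig(Tx @ np.linalg.inv(Ty))
    return Ua @ G.real


def align_to_reference(A_ref, A_p, n_anchor):
    """Undo A_p's column permutation/scaling relative to A_ref, using the
    n_anchor leading rows both replicas share (hypothetical anchor scheme)."""
    R, Q = A_ref[:n_anchor], A_p[:n_anchor]
    cos = np.abs((R / np.linalg.norm(R, axis=0)).T @ (Q / np.linalg.norm(Q, axis=0)))
    A_p = A_p[:, np.argmax(cos, axis=1)]  # column of A_p matching each reference column
    Q = A_p[:n_anchor]
    scale = np.sum(R * Q, axis=0) / np.sum(Q * Q, axis=0)  # per-column least-squares rescale
    return A_p * scale


def paracomp_mode1(X, F, L, M, N, P, n_anchor, seed=0):
    """Recover X's mode-1 factor (up to one global permutation/scaling)
    from P compressed L x M x N replicas joined by a master linear equation."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    U_shared = rng.standard_normal((I, n_anchor))  # assumed anchor columns common to all U_p
    U_list, A_list = [], []
    for p in range(P):  # compress + decompose could run on a separate worker per replica
        U = np.hstack([U_shared, rng.standard_normal((I, L - n_anchor))])
        V, W = rng.standard_normal((J, M)), rng.standard_normal((K, N))
        A_p = jennrich_mode1(compress(X, U, V, W), F, rng)  # U^T A, permuted and scaled
        if p > 0:
            A_p = align_to_reference(A_list[0], A_p, n_anchor)
        U_list.append(U)
        A_list.append(A_p)
    # Master linear equation for mode 1: [U_1^T; ...; U_P^T] A = [A_1; ...; A_P].
    lhs = np.vstack([U.T for U in U_list])
    return np.linalg.lstsq(lhs, np.vstack(A_list), rcond=None)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    F = 3
    A, B, C = (rng.standard_normal((n, F)) for n in (10, 11, 12))
    X = np.einsum("if,jf,kf->ijk", A, B, C)  # exactly rank-F "big" tensor (tiny here)
    A_hat = paracomp_mode1(X, F, L=5, M=5, N=5, P=6, n_anchor=3)
    # Each recovered column should be parallel to one true column (|cosine| near 1).
    cos = np.abs((A_hat / np.linalg.norm(A_hat, axis=0)).T @ (A / np.linalg.norm(A, axis=0)))
    print(cos.max(axis=1))
```

Modes 2 and 3 would follow the same pattern with V_p and W_p. In this sketch the master system has full column rank only when the stacked compression matrices supply at least I distinct rows, i.e. n_anchor + P(L − n_anchor) ≥ I (here 3 + 6·2 = 15 ≥ 10); the loop runs the replicas sequentially, whereas the architecture described above would distribute them across workers.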
Similar resources
Tensors in Power System Computation I: Distributed Computation for Optimal Power Flow, DC OPF
Tensor decomposition plays a key role in identifying common features across a collection of matrices in many areas of science. A fundamental need in big data research is to process data tabulated as large-scale matrices using eigenvectors. A higher-order generalized singular value decomposition technique successfully captures the common features of the same organ from multiple animals in genomi...
Dynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture
Due to diversified and frequently changing demands from customers, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of advancement of technologies to effectively support operations and create competitive advantage is critical for manufacturers to survive. To respond to these...
A Parallel Solver for Singular Integrals
A parallel version of a fast algorithm for singular integral transforms [6] is presented. The parallel version uses only a linear neighbor-to-neighbor communication path, which makes the algorithm highly scalable and suitable for any distributed-memory architecture.
Parallel and Distributed Systems for Probabilistic Reasoning
Scalable probabilistic reasoning is the key to unlocking the full potential of the age of big data. From untangling the biological processes that govern cancer to effectively targeting products and advertisements, probabilistic reasoning is how we make sense of noisy data and turn information into understanding and action. Unfortunately, the algorithms and tools for sophisticated structured pro...
Thesis Parallel and Distributed Systems for Probabilistic Reasoning
Scalable probabilistic reasoning is the key to unlocking the full potential of the age of big data. From untangling the biological processes that govern cancer to effectively targeting products and advertisements, probabilistic reasoning is how we make sense of noisy data and turn information into understanding and action. Unfortunately, the algorithms and tools for sophisticated structured pro...
Publication date: 2014